Amit Shetty
Describe the objective of this assignment. You can briefly state how you accomplish it.
The objective of this assignment is to understand what deep learning is. Deep learning is a branch of machine learning built on artificial neural networks. To understand it, I will focus on designing different types of neural networks. My process will be to collect the data and visualise it to gain some key insights, which will let me decide what my input and target variables should be. I will use Keras (https://keras.io/) to implement the neural networks. Keras defines neural networks as models, which I will then apply to the data to obtain predictions.
Introduce your data and visualize them. Describe your observations about the data. You can reuse the data that you examined in Assignment #0 (of course for classification).
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
While the datasets are divided into two different wine types, the characteristics used to assess wine quality remain the same for both wines.
Input variables (based on physicochemical tests): 1 - fixed acidity 2 - volatile acidity 3 - citric acid 4 - residual sugar 5 - chlorides 6 - free sulfur dioxide 7 - total sulfur dioxide 8 - density 9 - pH 10 - sulphates 11 - alcohol
Target variable (based on sensory data): 12 - quality (score between 0 and 10)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, GlobalMaxPooling1D, SpatialDropout1D, Embedding, Dropout
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
import tensorflow
import time
%matplotlib inline
df_red = pd.read_csv("winequality_red.csv")
df_white = pd.read_csv("winequality_white.csv")
We will add a new color column to the red and white wine datasets so the two wines can be distinguished once the datasets are merged.
df_red["color"] = "R"
df_white["color"] = "W"
Merging the red and white wine datasets for eventual training and testing
df_all=pd.concat([df_red,df_white],axis=0)
df_all.head()
To avoid any issues with spaces when processing the data, we rename the data columns, replacing the spaces with _ characters
df_white.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_red.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_all.rename(columns={'fixed acidity': 'fixed_acidity','citric acid':'citric_acid','volatile acidity':'volatile_acidity','residual sugar':'residual_sugar','free sulfur dioxide':'free_sulfur_dioxide','total sulfur dioxide':'total_sulfur_dioxide'}, inplace=True)
df_all.head()
Creating dummy variables for modelling. The other variables will be normalised when the models are created.
df = pd.get_dummies(df_all, columns=["color"])
df
# Checking for any null values
df_all.isnull().sum()
df_all.describe()
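The classes were described earlier as imbalanced; a quick `value_counts` check makes this concrete. The scores below are illustrative stand-ins; in the notebook this would be run as `df_all["quality"].value_counts()`.

```python
import pandas as pd

# Hypothetical quality scores illustrating the imbalance check;
# in the notebook this would be df_all["quality"].
quality = pd.Series([5, 5, 6, 6, 6, 6, 7, 5, 6, 3])
counts = quality.value_counts()
print(counts)
```

As expected for this dataset, the middle scores dominate while very high and very low scores are rare.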
Plotting correlation matrices for red and white wines individually and for the combined dataset
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (Reds)")
corr = df_red.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values, annot=True, cmap="Reds")
plt.show()
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (Whites)")
corr = df_white.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values, annot=True, cmap="Blues")
plt.show()
plt.subplots(figsize=(20,15))
ax = plt.axes()
ax.set_title("Wine Characteristic Correlation Heatmap (All wines)")
corr = df_all.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values, annot=True, cmap="Oranges")
plt.show()
Testing the association between wine density and residual sugar for the red and white wine datasets
scat1 = sns.regplot(x = "density", y = "residual_sugar", fit_reg = True, color='r', data = df_red)
plt.xlabel("Density of wine")
plt.ylabel("Residual sugar in wine in grams")
plt.title("Association between RED wine's density and residual sugar")
plt.show()
scat1 = sns.regplot(x = "density", y = "residual_sugar", fit_reg = True, color='b', data = df_white)
plt.xlabel("Density of wine")
plt.ylabel("Residual sugar in wine in grams")
plt.title("Association between WHITE wine's density and residual sugar")
plt.show()
We will now check how wine quality is distributed for red and white wines
df_red["quality"] = pd.Categorical(df_red["quality"])
sns.countplot(x="quality", data=df_red)
plt.xlabel("Quality level of RED wine (0-10 scale)")
plt.show()
df_white["quality"] = pd.Categorical(df_white["quality"])
sns.countplot(x="quality", data=df_white)
plt.xlabel("Quality level of WHITE wine (0-10 scale)")
plt.show()
One of the key factors affecting quality is the amount of alcohol in the wine
sns.catplot(x="quality", y="alcohol", data=df_red, kind="strip")  # factorplot was renamed to catplot in newer seaborn versions
plt.xlabel("Quality level of wine, 0-10 scale")
plt.ylabel("Alcohol level in wine in % ABV")
plt.title("Alcohol percent in each level of RED wine's quality")
plt.show()
sns.catplot(x="quality", y="alcohol", data=df_white, kind="strip")  # factorplot was renamed to catplot in newer seaborn versions
plt.xlabel("Quality level of wine, 0-10 scale")
plt.ylabel("Alcohol level in wine in % ABV")
plt.title("Alcohol percent in each level of WHITE wine's quality")
plt.show()
We will examine the distribution of volatile acidity for both red and white wine
redlabels = np.unique(df_red['quality'])
whitelabels = np.unique(df_white['quality'])
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
plt.title('Distribution of Volatile Acidity in Red and White Wine')
redcolors = np.random.rand(6,4)
whitecolors = np.append(redcolors, np.random.rand(1,4), axis=0)
for i in range(len(redcolors)):
    redy = df_red['alcohol'][df_red.quality == redlabels[i]]
    redx = df_red['volatile_acidity'][df_red.quality == redlabels[i]]
    ax[0].scatter(redx, redy, c=redcolors[i])
for i in range(len(whitecolors)):
    whitey = df_white['alcohol'][df_white.quality == whitelabels[i]]
    whitex = df_white['volatile_acidity'][df_white.quality == whitelabels[i]]
    ax[1].scatter(whitex, whitey, c=whitecolors[i])
ax[0].set_title("Red Wine")
ax[1].set_title("White Wine")
ax[0].set_xlim([0,1.7])
ax[1].set_xlim([0,1.7])
ax[0].set_ylim([5,15.5])
ax[1].set_ylim([5,15.5])
ax[0].set_xlabel("Volatile Acidity")
ax[0].set_ylabel("Alcohol")
ax[1].set_xlabel("Volatile Acidity")
ax[1].set_ylabel("Alcohol")
ax[1].legend(whitelabels, loc='best', bbox_to_anchor=(1.3, 1))
plt.show()
To summarise the data exploration, we plot the pairwise relationships between all the different variables
sns.pairplot(df_red, vars=df_red.columns[:-1])
plt.title("Pair plot showing RED wine characteristics")
plt.show()
sns.pairplot(df_white, vars=df_white.columns[:-1])
plt.title("Pair plot showing WHITE wine characteristics")
plt.show()
sns.pairplot(df_all, vars=df_all.columns[:-1])
plt.title("Pair plot showing ALL wine characteristics")
plt.show()
We have two target variables in our dataset, each serving a different purpose. The 'quality' column holds a score on a 0-10 scale, which will be used in our regression modelling. The 'color' column has two unique categories, 'Red' and 'White', identifying the type of wine that matches the characteristics.
The input variables will be the attributes determined by the physicochemical tests and certain known qualities of the wine.
From the correlation matrices we can see that citric acid, sulphates, fixed acidity and alcohol content are the strongest indicators of how good the red and white wine is.
The density distribution shows that red wine is denser than white wine, which will have an obvious effect on its taste and corresponding attributes.
However, despite the previous observation of red wine being richer than white wine, the quality distribution shows that white wine scores higher on average.
The scatter plots show the variation of alcohol levels in white and red wine, with red wine having a more scattered density distribution than white wine.
Refer: https://www.datacamp.com/community/tutorials/deep-learning-python
In this assignment, you are building a deep network with more than 5 layers using TensorFlow. Looking at the chart below, get some ideas about how you can construct your networks, for which problem, and why you pick your structure.
The following images are only to give you some ideas. You do not necessarily need to stick with these. You can come up with your own structure or shape.
We now have to determine what the input and target variables will be. In our case, the target variable will be the type of wine, i.e. red or white, predicted from the remaining attributes in our dataset.
X = df_all.iloc[:, :11]
X.shape
T = df_all.iloc[:, 12:13]
T.shape
# Apply one hot encoder
ohe = OneHotEncoder()
T = ohe.fit_transform(T).toarray()
X_train,X_test,T_train,T_test = train_test_split(X,T,test_size = 0.2)
T
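The text above notes that the remaining variables will be normalised when the models are created. A minimal sketch of that step with scikit-learn's `StandardScaler` follows, using synthetic stand-ins for the `X_train` / `X_test` splits produced above; the key point is that the scaler is fit on the training data only and reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for X_train / X_test from the cells above.
X_train_demo = np.array([[7.4, 0.70], [7.8, 0.88], [6.7, 0.58]])
X_test_demo = np.array([[7.0, 0.27]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # fit on training data only
X_test_scaled = scaler.transform(X_test_demo)        # reuse the same statistics
print(X_train_scaled.mean(axis=0))  # each column now has mean ~0
```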
The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated in each node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons with this kind of activation function are also called artificial neurons or linear threshold units. The goal of a feedforward network is to approximate some function f. For example, for a classifier, y = f(x) maps an input x to a category y. A feedforward network defines a mapping y = f(x;θ) and learns the value of the parameters θ that result in the best function approximation. These models are called feedforward because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself.
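The linear threshold unit described above can be sketched in a few lines of NumPy. The weights and inputs here are illustrative, not taken from the wine data.

```python
import numpy as np

# Minimal sketch of a single linear threshold unit: weighted sum,
# then fire (+1) if above the threshold, else stay deactivated (-1).
def perceptron(x, w, threshold=0.0):
    s = np.dot(w, x)  # sum of products of weights and inputs
    return 1 if s > threshold else -1

x = np.array([0.5, -0.2, 0.1])
w = np.array([1.0, 2.0, 3.0])
# s = 0.5 - 0.4 + 0.3 = 0.4 > 0, so the neuron fires
print(perceptron(x, w))  # 1
```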
I will use Keras to build the neural network model, as it makes it easy to tweak the parameters needed to build each layer of the network.
As I am building a basic neural network for the first of my three models, I will use one input layer, two hidden layers, and one output layer with 2 neurons, matching the scope of the data at hand.
The input dimension is set to 11, the number of attributes used as the deciding factors for the red and white wine classification.
The activation function used for the hidden layers is the rectified linear unit (ReLU), and the final activation function used for classification is the softmax function.
Keras compiles and fits the model using epochs and batch sizes, so the data is sent for training in multiple batches.
We will also check how the model performs by supplying the test data as a validation set when fitting the model.
model = Sequential()
model.add(Dense(20, input_dim=11, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Applying the model by training and validating with the test set
start_time_1 = time.time()
history = model.fit(X_train, T_train,validation_data = (X_test,T_test), epochs=100, batch_size=64)
end_time_1 = time.time()
print("Time taken to execute model 1 ==> {} seconds".format(end_time_1 - start_time_1))
A summary giving an overview of the model created and the number of trainable parameters, among other info.
model.summary()
T_pred = model.predict(X_test)
# Converting predictions to labels
pred = list()
for i in range(len(T_pred)):
    pred.append(np.argmax(T_pred[i]))
# Converting one-hot encoded test labels back to labels
test = list()
for i in range(len(T_test)):
    test.append(np.argmax(T_test[i]))
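The label-conversion loops above can also be written as a single vectorized call, since `np.argmax` accepts an `axis` argument. A sketch on stand-in prediction values:

```python
import numpy as np

# Stand-in for T_pred from the cell above: softmax rows over 2 classes.
T_pred_demo = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
pred_vec = np.argmax(T_pred_demo, axis=1)  # one call instead of a Python loop
print(pred_vec)  # [0 1 0]
```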
I will again use Keras to build the neural network model, as it makes it easy to tweak the parameters needed to build each layer of the network.
As I am building a basic neural network for the second of my three models, I will use an input layer of 11 neurons, followed by hidden layers of decreasing size, and one output layer with a single neuron, matching the scope of the data at hand.
The input dimension is set to 11, the number of attributes used as the deciding factors for the quality of the wine.
I will not use an activation function on the output layer, and the loss function used for regression will be the mean squared error.
Keras compiles and fits the model using epochs and batch sizes, so the data is sent for training in multiple batches.
We will also check how the model performs by supplying the test data as a validation set when fitting the model.
X1 = X.iloc[:,0:11]
X1
T1 = df_all.iloc[:, 11:12]
T1
X1_train,X1_test,T1_train,T1_test = train_test_split(X1,T1,test_size = 0.2)
X1_train.shape
T1_train.shape
model1 = Sequential()
model1.add(Dense(11, input_dim=11, activation='relu'))
model1.add(Dense(9, activation='relu'))
model1.add(Dense(7, activation='relu'))
model1.add(Dense(5, activation='relu'))
model1.add(Dense(3, activation='relu'))
model1.add(Dense(1))
model1.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])  # note: accuracy is not a meaningful metric for regression
start_time_2 = time.time()
history1 = model1.fit(X1_train, T1_train, validation_data = (X1_test,T1_test), epochs=100, batch_size=32)
end_time_2 = time.time()
print("Time taken to execute model 2 ==> {} seconds".format(end_time_2 - start_time_2))
A summary giving an overview of the model created and the number of trainable parameters, among other info.
model1.summary()
T1_pred = model1.predict(X1_test)
T1_pred.shape
T1_pred = np.array(T1_pred)
# Rounding the values to determine the exact quality of wine
T1_pred = np.round(T1_pred)
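Since accuracy is a poor fit for regression, metrics such as mean absolute error and root mean squared error give a better picture of how far the predicted quality scores are from the true ones. A sketch on stand-in values for `T1_test` / `T1_pred`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Stand-in quality scores illustrating regression-appropriate metrics.
y_true = np.array([5, 6, 7, 5])
y_pred = np.array([5.0, 6.0, 6.0, 4.0])
mae = mean_absolute_error(y_true, y_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalises larger errors
print(mae, rmse)  # 0.5, ~0.707
```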
Long Short-Term Memory units, or LSTMs, were proposed by the German researchers Sepp Hochreiter and Juergen Schmidhuber as a solution to the vanishing gradient problem. LSTMs help preserve the error that can be backpropagated through time and layers. By maintaining a more constant error, they allow recurrent nets to continue to learn over many time steps (over 1000), thereby opening a channel to link causes and effects remotely. This is one of the central challenges to machine learning and AI, since algorithms are frequently confronted by environments where reward signals are sparse and delayed, such as life itself. LSTMs contain information outside the normal flow of the recurrent network in a gated cell. Information can be stored in, written to, or read from a cell, much like data in a computer’s memory. The cell makes decisions about what to store, and when to allow reads, writes and erasures, via gates that open and close. Unlike the digital storage on computers, however, these gates are analog, implemented with element-wise multiplication by sigmoids, which are all in the range of 0-1. Analog has the advantage over digital of being differentiable, and therefore suitable for backpropagation.
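The element-wise sigmoid gating described above can be sketched directly: a gate value near 1 lets information through, a value near 0 blocks it. The numbers below are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A toy cell state and gate pre-activations; the gate squashes to (0, 1)
# and then scales the cell state element-wise.
cell_state = np.array([2.0, -1.0, 0.5])
gate = sigmoid(np.array([10.0, -10.0, 0.0]))  # ~[1, 0, 0.5]
gated = gate * cell_state                     # pass, block, half-pass
print(np.round(gated, 3))
```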
I will use Keras to build this model, since it makes it much easier to build the model and fine-tune the hyperparameters.
I will use the LSTM layer as the primary layer where the neural network is trained.
To ease the flow of data into the LSTM layer, an Embedding layer maps the input into 256-dimensional vectors.
The SpatialDropout1D layer randomly drops 30 percent of the embedding's feature maps before they are sent to the LSTM.
A Dropout layer with a 30 percent rate further regularises the output of the dense layer.
Post-processing involves a ReLU activation followed by a softmax activation to finalise our classification model.
model_lstm = Sequential()
model_lstm.add(Embedding(input_dim = 6000, output_dim = 256))
model_lstm.add(SpatialDropout1D(0.3))
model_lstm.add(LSTM(256, dropout = 0.3, recurrent_dropout = 0.3))
model_lstm.add(Dense(256, activation = 'relu'))
model_lstm.add(Dropout(0.3))
model_lstm.add(Dense(2, activation = 'softmax'))
model_lstm.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
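LSTM layers expect 3D input of shape (samples, timesteps, features). A hypothetical alternative to the Embedding layer used above is to treat each of the 11 tabular attributes as one "timestep" via a simple reshape; this is only a sketch of the data-shaping idea, not the model actually trained below.

```python
import numpy as np

# 32 stand-in samples with the 11 wine attributes each.
X_demo = np.random.rand(32, 11)
# Reshape to (samples, timesteps, features) so an LSTM layer could
# consume the attributes as a length-11 sequence of scalars.
X_seq = X_demo.reshape(32, 11, 1)
print(X_seq.shape)  # (32, 11, 1)
```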
start_time_3 = time.time()
history2 = model_lstm.fit(X_train, T_train,validation_data = (X_test,T_test), epochs=100, batch_size=64)
end_time_3 = time.time()
print("Time taken to execute model 3 ==> {} seconds".format(end_time_3 - start_time_3))
model_lstm.summary()
T2_pred = model_lstm.predict(X_test)
T2_pred.shape
T2_pred
# Converting predictions to labels
pred2 = list()
for i in range(len(T2_pred)):
    pred2.append(np.argmax(T2_pred[i]))
# Converting one-hot encoded test labels back to labels
test2 = list()
for i in range(len(T_test)):
    test2.append(np.argmax(T_test[i]))
history.history
Applying a simple feedforward network to the classification problem yielded a very good result, since the softmax function at the last layer is well suited to classification. We can see the model's accuracy oscillate between 97 and 99 percent.
# IF GPU
# plt.plot(history.history['accuracy'])
# plt.plot(history.history['val_accuracy'])
# IF CPU
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
The loss on the test set is seen to be much lower than on the training set, showing that the model does better as the number of epochs rises. While there are occasional spikes in the graph indicating the model has not performed that well, on average we can see a stable low loss compared to the training set.
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
The accuracy is calculated for the predictions made and we can see that the model created is highly accurate
a = accuracy_score(pred,test)
print('Accuracy is:', a*100)
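A confusion matrix would complement the accuracy score above by showing which class is misclassified as which. A sketch on stand-in labels in place of `test` and `pred`:

```python
from sklearn.metrics import confusion_matrix

# Stand-in true and predicted class labels (0 = one wine color, 1 = the other).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
print(cm)
```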
Applying a simple feedforward network to the regression problem yielded a rather average result; the mean squared error loss at the last layer didn't prove to be the best the model could do with a regression problem. We can see the model's accuracy oscillate between 49 and 57 percent, which is not a good indicator of this model's performance.
# IF GPU
# plt.plot(history1.history['accuracy'])
# plt.plot(history1.history['val_accuracy'])
# IF CPU
plt.plot(history1.history['acc'])
plt.plot(history1.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
As we saw from the previous graph, with such a low accuracy it is a given that the loss would be much higher than expected. We can clearly see that there is no overlap between the train and test loss curves, which implies that the chances of a successful prediction are very low.
plt.plot(history1.history['loss'])
plt.plot(history1.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
The accuracy score for the regression model turned out to be very low, consistent with the low accuracy and high loss seen in the graphs above
b = accuracy_score(T1_test, T1_pred)
print('Accuracy is:', b*100)
LSTM proved a little more challenging to apply, as it required more work tuning the data. From the graph below we can see that although training shows good accuracy, the test curve doesn't quite converge, leaving us with a somewhat lower accuracy on the test data. Nonetheless the model shows good promise, with a test accuracy oscillating between 95 and 97 percent.
# IF GPU
# plt.plot(history2.history['accuracy'])
# plt.plot(history2.history['val_accuracy'])
# IF CPU
plt.plot(history2.history['acc'])
plt.plot(history2.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
Of all the models tested so far, LSTM has shown the lowest stable loss values on the test set. While the loss is certainly higher on the test set than on the training set, which is reflected in the accuracy score below, this nevertheless shows that the predictions made by this model are good.
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()
The high accuracy and low loss from the last two graphs give a very good accuracy score, second only to the simple feedforward model used for classification
c = accuracy_score(test2, pred2)
print('Accuracy is:', c*100)
Discuss the challenges or something that you learned. If you have any suggestions about the assignment, you can write about them.
This has been one of the more interesting challenges to tackle, since there is a lot of plug and play involved in getting a highly accurate score on the predictions. It has taught me not only how to design neural models, but also how minute modifications to the network, and sometimes a complete network overhaul, can give a much better score on the predictions.
One of the challenges in this assignment was getting the underlying libraries to run on GPUs directly, since I was testing the code on my personal device, which has a CUDA-supported graphics card. The testing speed was completely different, with the GPU racing through the models much faster than the CPU. Setting up the GPU build of TensorFlow was the real challenge, since there were a lot of dependencies on the CUDA SDK and its libraries; I chose Windows as my platform since it was better supported.
[1] Keras Documentation, https://keras.io/
[2] Long Short Term Memory Model, TowardsDataScience.com, https://towardsdatascience.com/machine-learning-recurrent-neural-networks-and-long-short-term-memory-lstm-python-keras-example-86001ceaaebc
[3] Regression Using Keras, MachineLearningMastery.com, https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
[4] Long Short Term Memory Model, SkyMind.ai, https://skymind.ai/wiki/lstm
[5] TensorFlow GPU documentation, TensorFlow.org, https://www.tensorflow.org/install/gpu
Only well-written notebooks will be graded. Please follow the structure and fill it in as in the other assignments.
| extra credit points | description |
|---|---|
| 1 | First structure (implementation, explanation, plot results, discussion of results) |
| 1 | Second structure (implementation, explanation, plot results, discussion of results) |
| 1 | Third structure (implementation, explanation, plot results, discussion of results) |
| 1 | Explaining and discussing the reason for the selection (Any relation to your data?) |
|  | Comparing the results, discuss or verify your choice |
The commands below showcase the hardware used to test the models
!wmic cpu get name
!nvidia-smi
This is used to check which device Keras will use to perform the neural network processing.
# The output here is blank if it is running on CPU
import tensorflow as tf
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
As part of the extra credit, I ran the models on both the CPU and the GPU shown above and measured the time each model takes to execute
Simple Feedforward Model (Classification) --> 11.90 seconds with 98.38% accuracy
Simple Feedforward Model (Regression) --> 18.20 seconds with 49.69% accuracy
Long Short Term Memory (Classification) --> 604.39 seconds with 96.54% accuracy
Simple Feedforward Model (Classification) --> 30.94 seconds with 97.76% accuracy
Simple Feedforward Model (Regression) --> 74.79 seconds with 56.46% accuracy
Long Short Term Memory (Classification) --> 139.56 seconds with 94.69% accuracy
We came across a really interesting observation while performing this comparison: the simple feedforward models executed faster in the first run, while the LSTM executed much faster in the second.